Equatorial Guinea
Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models
Piedrahita, David Guzman, Strauss, Irene, Schölkopf, Bernhard, Mihalcea, Rada, Jin, Zhijing
As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases continue to persist. While prior work has primarily examined socio-demographic and left--right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy--authoritarianism spectrum. In this paper, we propose a novel methodology to assess such alignment, combining (1) the F-scale, a psychometric tool for measuring authoritarian tendencies, (2) FavScore, a newly introduced metric for evaluating model favorability toward world leaders, and (3) role-model probing to assess which figures are cited as general role-models by LLMs. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin. Further, models are found to often cite authoritarian figures as role models, even outside explicit political contexts. These results shed light on ways LLMs may reflect and potentially reinforce global political ideologies, highlighting the importance of evaluating bias beyond conventional socio-political axes. Our code is available at: https://github.com/irenestrauss/Democratic-Authoritarian-Bias-LLMs.
- North America > Cuba (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Syria (0.14)
- (185 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Law (0.67)
- Government > Regional Government > Asia Government > Middle East Government (0.46)
10 captivating images from National Geographic's Photo Ark
Since 2006, the project has photographed 17,000 species in the world's zoos, aquariums, and wildlife sanctuaries. Photographs from the Photo Ark will be featured in the inaugural exhibition at the National Geographic Museum of Exploration in Washington, D.C. A picture is said to be worth a thousand words, but some photographs are worth 17,000. Well, 17,000 species, that is. For National Geographic's Photo Ark project, photographer Joel Sartore is documenting all species living in the world's zoos, aquariums, and wildlife sanctuaries.
- North America > United States > District of Columbia > Washington (0.27)
- Africa > Equatorial Guinea > Gulf of Guinea > Bioko Island > Bioko Norte > Malabo (0.05)
The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification
Wasmuht, Dante Francisco, Brookes, Otto, Schall, Maximillian, Palencia, Pablo, Beirne, Chris, Burghardt, Tilo, Mirmehdi, Majid, Kühl, Hjalmar, Arandjelovic, Mimi, Pottie, Sam, Bermant, Peter, Asheim, Brandon, Toh, Yi Jin, Elzinga, Adam, Holmberg, Jason, Whitworth, Andrew, Flatt, Eleanor, Gustafson, Laura, Ryali, Chaitanya, Hu, Yuan-Ting, Guo, Baishan, Westbury, Andrew, Saenko, Kate, Suris, Didac
Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing datasets are limited in scale, constrained to a few species, or lack sufficient temporal and geographical diversity, leaving no suitable benchmark for training general-purpose MAT models applicable across wild animal populations. To address this, we introduce SA-FARI, the largest open-source MAT dataset for wild animals. It comprises 11,609 camera trap videos collected over approximately 10 years (2014-2024) from 741 locations across 4 continents, spanning 99 species categories. Each video is exhaustively annotated, culminating in ~46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels. Alongside the task-specific annotations, we publish anonymized camera trap locations for each video. Finally, we present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal prompts. We also compare against vision-only methods developed specifically for wildlife analysis. SA-FARI is the first large-scale dataset to combine high species diversity, multi-region coverage, and high-quality spatio-temporal annotations, offering a new foundation for advancing generalizable multi-animal tracking in the wild. The dataset is available at https://www.conservationxlabs.com/sa-fari.
- South America (0.14)
- Africa > Uganda (0.04)
- Africa > Kenya (0.04)
- (7 more...)
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- (98 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education > Health & Safety > School Nutrition (0.93)
- Health & Medicine > Consumer Health (0.93)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Equatorial Guinea > Gulf of Guinea (0.04)
Language Specific Knowledge: Do Models Know Better in X than in English?
Agarwal, Ishika, Bozdag, Nimet Beyza, Hakkani-Tür, Dilek
Often, multilingual language models are trained with the objective to map semantically similar content (in different languages) in the same latent space. In this paper, we show a nuance in this training objective, and find that by changing the language of the input query, we can improve the question answering ability of language models. Our contributions are two-fold. First, we introduce the term Language Specific Knowledge (LSK) to denote queries that are best answered in an "expert language" for a given LLM, thereby enhancing its question-answering ability. We introduce the problem of language selection -- for some queries, language models can perform better when queried in languages other than English, sometimes even better in low-resource languages -- and the goal is to select the optimal language for the query. Second, we introduce simple to strong baselines to test this problem. Additionally, as a first-pass solution to this novel problem, we design LSKExtractor to benchmark the language-specific knowledge present in a language model and then exploit it during inference. To test our framework, we employ three datasets that contain knowledge about both cultural and social behavioral norms. Overall, LSKExtractor achieves up to 10% relative improvement across datasets, and is competitive against strong baselines, while being feasible in real-world settings. Broadly, our research contributes to the open-source development (https://github.com/agarwalishika/LSKExtractor/tree/main) of language models that are inclusive and more aligned with the cultural and linguistic contexts in which they are deployed.
- Asia > Laos (0.28)
- Asia > South Korea (0.14)
- Asia > North Korea (0.14)
- (182 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Omnilingual ASR team, Keren, Gil, Kozhevnikov, Artyom, Meng, Yen, Ropers, Christophe, Setzler, Matthew, Wang, Skyler, Adebara, Ife, Auli, Michael, Balioglu, Can, Chan, Kevin, Cheng, Chierh, Chuang, Joe, Droof, Caley, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Erben, Alexander, Gao, Cynthia, Gonzalez, Gabriel Mejia, Lyu, Kehan, Miglani, Sagar, Pratap, Vineel, Sadagopan, Kaushik Ram, Saleem, Safiyyah, Turkatenko, Arina, Ventayol-Boada, Albert, Yong, Zheng-Xin, Chung, Yu-An, Maillard, Jean, Moritz, Rashel, Mourachko, Alexandre, Williamson, Mary, Yates, Shireen
Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date--including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.
- North America > Canada > Alberta (0.14)
- Europe > Austria > Vienna (0.14)
- Africa > Sudan (0.14)
- (53 more...)
- Health & Medicine (1.00)
- Education (0.67)
- Information Technology (0.67)
AI Diffusion in Low Resource Language Countries
Misra, Amit, Zamir, Syed Waqas, Hamidouche, Wassim, Becker-Reshef, Inbal, Ferres, Juan Lavista
Artificial intelligence (AI) is diffusing globally at unprecedented speed, but adoption remains uneven. Frontier Large Language Models (LLMs) are known to perform poorly on low-resource languages due to data scarcity. We hypothesize that this performance deficit reduces the utility of AI, thereby slowing adoption in Low-Resource Language Countries (LRLCs). To test this, we use a weighted regression model to isolate the language effect from socioeconomic and demographic factors, finding that LRLCs have a share of AI users that is approximately 20% lower relative to their baseline. These results indicate that linguistic accessibility is a significant, independent barrier to equitable AI diffusion.
- North America > The Bahamas (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- South America > Venezuela (0.04)
- (186 more...)
Impact of clinical decision support systems (CDSS) on clinical outcomes and healthcare delivery in low- and middle-income countries: protocol for a systematic review and meta-analysis
Jain, Garima, Bodade, Anand, Pati, Sanghamitra
Clinical decision support systems (CDSS) are used to improve clinical and service outcomes, yet evidence from low- and middle-income countries (LMICs) is dispersed. This protocol outlines methods to quantify the impact of CDSS on patient and healthcare delivery outcomes in LMICs. We will include comparative quantitative designs (randomized trials, controlled before-after, interrupted time series, comparative cohorts) evaluating CDSS in World Bank-defined LMICs. Standalone qualitative studies are excluded; mixed-methods studies are eligible only if they report comparative quantitative outcomes, for which we will extract the quantitative component. Searches (from inception to 30 September 2024) will cover MEDLINE, Embase, CINAHL, CENTRAL, Web of Science, Global Health, Scopus, IEEE Xplore, LILACS, African Index Medicus, and IndMED, plus grey sources. Screening and extraction will be performed in duplicate. Risk of bias will be assessed with RoB 2 (randomized trials) and ROBINS-I (non-randomized). Random-effects meta-analysis will be performed where outcomes are conceptually or statistically comparable; otherwise, a structured narrative synthesis will be presented. Heterogeneity will be explored using relative and absolute metrics and a priori subgroups or meta-regression (condition area, care level, CDSS type, readiness proxies, study design).
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Wu, Jiageng, Gu, Bowen, Zhou, Ren, Xie, Kevin, Snyder, Doug, Jiang, Yixing, Carducci, Valentina, Wyss, Richard, Desai, Rishi J, Alsentzer, Emily, Celi, Leo Anthony, Rodman, Adam, Schneeweiss, Sebastian, Chen, Jonathan H., Romero-Brufau, Santiago, Lin, Kueiyu Joshua, Yang, Jie
Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models being released at an accelerated pace. However, benchmarking on large-scale real-world data such as electronic health records (EHRs) is critical, as clinical decisions are directly informed by these sources, yet current evaluations remain limited. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text, failing to capture the complexity of real-world clinical data. Others focus narrowly on specific application scenarios, limiting their generalizability across broader clinical use. To address this gap, we present BRIDGE, a comprehensive multilingual benchmark comprising 87 tasks sourced from real-world clinical data sources across nine languages. It covers eight major task types spanning the entire continuum of patient care across six clinical stages and 20 representative applications, including triage and referral, consultation, information extraction, diagnosis, prognosis, and billing coding, and involves 14 clinical specialties. We systematically evaluated 95 LLMs (including DeepSeek-R1, GPT-4o, Gemini series, and Qwen3 series) under various inference strategies. Our results reveal substantial performance variation across model sizes, languages, natural language processing tasks, and clinical specialties. Notably, we demonstrate that open-source LLMs can achieve performance comparable to proprietary models, while medically fine-tuned LLMs based on older architectures often underperform versus updated general-purpose models. The BRIDGE and its corresponding leaderboard serve as a foundational resource and a unique reference for the development and evaluation of new LLMs in real-world clinical text understanding. The BRIDGE leaderboard: https://huggingface.co/spaces/YLab-Open/BRIDGE-Medical-Leaderboard
- North America > United States > Illinois > Champaign County > Urbana (0.13)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (74 more...)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)